Engineering posts about Large Language Models

Curated summaries and key learnings for engineers working with Large Language Models.

Accelerating LLM Inference with Prompt Caching for Open‑Source Models on Databricks

The article explains how prompt caching accelerates inference for large language models (LLMs) on Databricks. It describes how repeated prompts can lead to inefficiencies in...

Databricks

Databricks for Good and Virtue Foundation: Partnering to Connect Medical Volunteers to Critical Health Services in 72 Countries

The article outlines the collaboration between Databricks for Good and the Virtue Foundation to enhance global health delivery through an AI-enabled platform. The initiative focuses on aggregating...

Databricks

How to safeguard AI workloads with Unity AI Gateway Guardrails

The article outlines the importance of implementing guardrails in AI applications to protect sensitive information and ensure compliance with security standards. It details how Unity AI Gateway...

Databricks

Databricks context engineer associate: the industry’s first certification for reliable AI agent systems

The article introduces the Databricks Certified Context Engineer Associate certification, the first of its kind aimed at enhancing the reliability of AI agent systems through effective context...

Salesforce

Creating a Multi-Tenant AI Agent Platform Handling 7K+ Sessions Without Cross-Team Interference

The article outlines the development of Bring Your Own Planner (BYOP), a multi-tenant AI agent platform designed to enhance team autonomy and scalability within Salesforce. It addresses the...

AWS

Amazon Bedrock introduces new advanced prompt optimization and migration tool

Amazon Bedrock has introduced an advanced prompt optimization tool that lets users optimize their prompts for multiple models at once. This tool facilitates migration to new models or...

Google

13m

Build Long-running AI agents that pause, resume, and never lose context with ADK

This article presents a comprehensive guide to building long-running AI agents that can pause, resume, and maintain context using the Agent Development Kit (ADK). It highlights the limitations of...

Databricks

Pushing the Frontier for Data Agents with Genie

The article presents Genie, a data agent developed by Databricks to improve the analysis of both structured and unstructured enterprise data. It highlights the challenges...

Databricks

How Superhuman and Databricks built a 200K QPS inference platform together

The article describes the collaboration between Superhuman and Databricks in developing a high-performance inference platform capable of handling over 200,000 queries per second (QPS) with stringent...

Apple

From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs

The paper introduces the Spatial-Functional Intelligence Benchmark (SFI-Bench), aimed at evaluating the advanced reasoning capabilities of multimodal large language models (MLLMs). It highlights the...

Databricks

28m

Generative AI for Business: A Complete Strategy and Implementation Guide

The article discusses the transformative potential of generative AI in business, highlighting its ability to create significant economic value across various sectors. It emphasizes the importance of...

Databricks

11m

LLM Vs AI: A Practical Guide to Differences, Use Cases, and Tools

This article is a guide to the distinctions between large language models (LLMs) and the broader field of artificial intelligence (AI). It outlines the scope, core...

Google

12m

Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding

The article discusses accelerating large language model (LLM) inference with block diffusion speculative decoding, specifically the DFlash method, on Google...

Apple

Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents

The article introduces the concept of a Reinforced Agent that enhances tool-calling agents by incorporating inference-time feedback. This approach aims to address the limitations of traditional...

Apple

LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning

The paper introduces LaDiR (Latent Diffusion Reasoner), a novel framework that enhances the reasoning capabilities of large language models (LLMs) by integrating latent diffusion models. It addresses...

Apple

Adaptive Thinking: Large Language Models Know When to Think in Latent Space

The article presents research on adaptive thinking in large language models (LLMs), particularly focusing on how these models can optimize their reasoning processes during inference. It introduces...

DigitalOcean

DigitalOcean Dedicated Inference: A Technical Deep Dive

The article examines DigitalOcean's Dedicated Inference service, designed to efficiently manage large language model (LLM) inference at scale. It highlights the challenges of handling high...

Apple

11m

ParaRNN: Large-Scale Nonlinear RNNs, Trainable in Parallel

The article presents ParaRNN, a novel framework developed by Apple researchers that significantly enhances the training efficiency of Recurrent Neural Networks (RNNs) by enabling parallelization....

Databricks

22m

A Practical Guide to LLM Fine Tuning

This article is a practical guide for ML engineers and AI practitioners who fine-tune large language models (LLMs) for specific tasks. It outlines the entire lifecycle of LLM...

Databricks

Are LLM agents good at join order optimization?

This article explores applying large language models (LLMs) to join order optimization in SQL queries, a long-standing challenge in database management. Traditional...